Abstract

Harald Racke [STOC 2008] described a new method to obtain hierar- 
chical decompositions of networks in a way that minimizes the congestion. 
Racke's approach is based on an equivalence that he discovered between 
minimizing congestion and minimizing stretch (in a certain setting). Here
we present Racke's equivalence in an abstract setting that is more general 
than the one described in Racke's work, and clarifies the power of Racke's 
result. In addition, we present a related (but different) equivalence that 
was developed by Yuval Emek [ESA 2009] and is only known to apply to 
planar graphs. 

1 Introduction 

In this manuscript we present results of a manuscript of Harald Racke titled
"Optimal hierarchical decompositions for congestion minimization in networks" [16].
Our presentation is more modular than the original presentation of Racke in 
that it separates the existential aspects of Racke's result from the algorithmic 
aspects. The existential results are presented in a more abstract setting that 
allows the reader to appreciate the generality of Racke's result. Our presenta- 
tion is also more careful not to lose on the tightness of the parameters (e.g., not 
to give away constant multiplicative factors). For slides of a talk based on this 
manuscript see [TU] . 

Our manuscript is organized as follows. In Section [2] we discuss the opti- 
mization problem of min-bisection. Achieving an improved approximation ratio 
for this problem is one of the results of [16], and we use this problem as a
motivation for the main results that follow. In Section [3] we present in an abstract
setting what we view as Racke's main result, namely, an equivalence between 
two types of probabilistic embeddings, one concerned with faithfully representing
distances and the other with faithfully representing capacities. In Section [4]
we briefly discuss algorithmic versions of the existential result. In Section [5]
we show how the machinery developed leads to an approximation algorithm
for min-bisection. In Section [6] we present results related to the main theme
of this manuscript, but that do not appear in [16]. These results concern an
equivalence between deterministic embeddings in planar graphs, and were first
developed and used by Yuval Emek in [5].

2 Min-bisection 

In the min-bisection problem, the input is a graph with an even number n of 
vertices. In the weighted version of the problem, edges have arbitrary nonneg- 
ative weights, whereas in the unweighted version, the weight of every edge is 1. 
A bisection of the graph is a partition of the set of vertices into two sets of equal
size. The width of a bisection is the total weight of the edges that are cut (an 
edge is cut if its endpoints are on different sides of the partition). Min-bisection 
asks for a bisection of minimum width. This problem is NP-hard. 
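
As a concrete illustration of these definitions, the following is a minimal brute-force sketch (exponential time, so only usable on tiny instances). The function names and the toy instance are assumptions introduced here for illustration and do not come from the manuscript.

```python
from itertools import combinations

def bisection_width(weights, side):
    """Total weight of edges with exactly one endpoint in `side`."""
    side = set(side)
    return sum(w for (u, v), w in weights.items() if (u in side) != (v in side))

def min_bisection_brute_force(n, weights):
    """Try every balanced partition of vertices 0..n-1 (exponential time)."""
    assert n % 2 == 0
    best_width, best_side = None, None
    # Fix vertex 0 on one side so that each partition is enumerated once.
    for rest in combinations(range(1, n), n // 2 - 1):
        side = (0,) + rest
        width = bisection_width(weights, side)
        if best_width is None or width < best_width:
            best_width, best_side = width, side
    return best_width, best_side

# Toy instance (hypothetical): two unit-weight triangles joined by one edge.
toy = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
       (3, 4): 1, (4, 5): 1, (3, 5): 1,
       (2, 3): 1}
print(min_bisection_brute_force(6, toy))  # (1, (0, 1, 2))
```

On the toy instance the only bisection of width 1 separates the two triangles.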

One line of research dealing with the NP-hardness of min-bisection offers a 
bi-criteria approximation. Namely, it is concerned with developing algorithms 
that produce a partition of the graph into nearly equal parts (rather than exactly 
equal parts), such that the width of the partition is not much larger than the 
width of the minimum bisection. The methodology used by these algorithms 
was developed in a sequence of papers and currently allows one to efficiently find 
a near bisection (e.g., each of the two parts has at least one third of the vertices) 
whose width is within a multiplicative factor of $O(\sqrt{\log n})$ of the width of the
minimum bisection. The methodology used by these papers is related to the 
theme of the current manuscript. It also uses an interplay between distance and 
capacity. We briefly explain this interplay, and refer the readers to [21 [TSl H] 
for more details. 

For simplicity, assume that the input graph is a complete graph, by replacing 
non-edges by edges of weight 0. View the weight of an edge in a min-bisection 
problem as specifying its capacity. The width of a bisection is the total capacity 
of its associated cut. The first phase of the bicriteria approximation algorithm 
involves solving some linear program LP (in [14, 15]) or semidefinite program
SDP (in [4]). The output of this mathematical program can be thought of 
as a fractional cut in the following sense: edges are assigned lengths, and the 
longer an edge is, a larger fraction of it belongs to the cut. Naturally, for the 
fractional solution to have small value, the LP (or SDP) will try to assign short 
lengths to edges of high capacity. (This is a theme that will reappear in the 
proof of Theorem [6].) Thereafter, this fractional solution is rounded to give a
near bisection of width not much larger than the value of the fractional solution. 
The rounding procedure is more likely to cut the long edges than the short edges. 
Or equivalently, two vertices of short distance from each other (where distance 
is measured by sum of edge lengths along shortest path) are likely to fall in 
the same side of the partition. Hence to find a near bisection of small capacity, 
the bicriteria approximation algorithms introduce an intermediate notion of 
distance, and carefully choose (as a solution to the LP or SDP) a distance
function that interacts well with the capacities of the edges. In fact, there is a 
formal connection between the distortion of this distance function in comparison 
to $\ell_1$ distances and the approximation ratio (in terms of minimizing capacity)
that one gets from this methodology. (See [TS] for an exact statement.) 

To move from a bicriteria approximation to a true approximation (in which 
the output is a true bisection, and approximation is only in the sense that the 
width is not necessarily optimal), it appears unavoidable that one should use 
in some way dynamic programming. Consider the task of determining whether 
a graph has a bisection of width 0. Such a bisection exists if and only if there
is a set of connected components of the graph whose total size is exactly n/2.
Determining whether this is the case amounts to solving a subset sum problem 
(with sizes of connected components serving as input to the subset sum prob- 
lem), and the only algorithm known to solve subset sum (in time polynomial in 
the numbers involved) is dynamic programming. In [IT], the techniques used 
in the bicriteria approximation were combined with a dynamic programming
approach to produce a true approximation for min-bisection with approximation
ratio $O((\log n)^{3/2})$ (obtained as the bicriteria approximation ratio times
$O(\log n)$). Here we shall describe Racke's approach that gives a better
approximation ratio, $O(\log n)$.
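
The width-0 case mentioned above can be sketched in a few lines. This is an illustrative sketch only: the helper names are hypothetical, and `edges` is assumed to list only the edges of positive weight.

```python
def component_sizes(n, edges):
    """Sizes of connected components, via union-find on vertices 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return sorted(sizes.values())

def has_zero_width_bisection(n, edges):
    """Subset-sum DP: can some set of components total exactly n/2 vertices?"""
    half = n // 2
    reachable = [True] + [False] * half
    for s in component_sizes(n, edges):
        # Iterate totals downwards so each component is used at most once.
        for total in range(half, s - 1, -1):
            if reachable[total - s]:
                reachable[total] = True
    return reachable[half]

print(has_zero_width_bisection(6, [(0, 1), (2, 3), (3, 4)]))  # True
print(has_zero_width_bisection(6, [(0, 1), (2, 3), (4, 5)]))  # False
```

In the first call the components have sizes 1, 2 and 3, so sizes 1 and 2 together give n/2 = 3. In the second call all components have size 2 and no subset sums to 3.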

Trees (and more generally, graphs of bounded treewidth, though this subject 
is beyond the scope of the current manuscript) form a family of graphs on which 
many NP-hard problems can be solved using dynamic programming. In partic- 
ular, min bisection can be solved in polynomial time on trees. This suggests 
the following plan for approximating min-bisection on general graphs, which is 
presented here using terms that are suggestive but have not been defined yet. 
First, find a "low distortion embedding" of the input graph into a tree. Then 
solve min bisection optimally on the tree, using dynamic programming. The 
solution will induce a bisection on the original graph, and the approximation 
ratio will be bounded by the distortion of the embedding into the tree. 
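
To make the dynamic programming step of this plan concrete, the following is a minimal sketch of a knapsack-style dynamic program that computes the minimum bisection width of a weighted tree. It is one possible formulation, not necessarily the one used in the cited works; the function name and the table layout are assumptions made for illustration.

```python
import math

def tree_min_bisection(n, tree_edges):
    """tree_edges: list of (u, v, weight) forming a tree on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v, w in tree_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Preorder from root 0; its reverse visits children before parents.
    order, parent = [], [-1] * n
    stack = [0]
    while stack:
        v = stack.pop()
        order.append(v)
        for u, _ in adj[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)
    # f[v][s][k]: min weight of cut edges inside subtree(v), given that v is
    # on side s and exactly k vertices of subtree(v) are on side 0.
    f = [None] * n
    for v in reversed(order):
        cur = {0: [math.inf, 0.0], 1: [0.0, math.inf]}
        for u, w in adj[v]:
            if parent[u] != v:
                continue  # u is the parent of v, not a child
            child, merged = f[u], {}
            for s in (0, 1):
                best = [math.inf] * (len(cur[s]) + len(child[0]) - 1)
                for k1, a in enumerate(cur[s]):
                    for su in (0, 1):
                        cut = w if su != s else 0.0
                        for k2, b in enumerate(child[su]):
                            if a + b + cut < best[k1 + k2]:
                                best[k1 + k2] = a + b + cut
                merged[s] = best
            cur = merged
        f[v] = cur
    return min(f[0][0][n // 2], f[0][1][n // 2])

# Path 0-1-2-3 with weights 5, 1, 5: cutting the middle edge is optimal.
print(tree_min_bisection(4, [(0, 1, 5.0), (1, 2, 1.0), (2, 3, 5.0)]))  # 1.0
```

The running time of this sketch is polynomial in n, since each pair of vertices contributes a bounded amount of work when subtrees are merged.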

The plan as described above has certain drawbacks. One is that the dis- 
tortion when embedding a general graph into a tree might be very large (e.g., 
the distance distortion for an n-cycle). This problem has been addressed in a 
satisfactory way in previous work [2, 5]. Rather than embed the graph in one
tree, one finds a probabilistic embedding into a family of (dominating) trees (the 
requirement that the trees be dominating is a technical requirement that will be 
touched upon in Section [5]), and considers average distortion (averaged over all 
trees). When the objective function is linear (as in the case of min bisection), 
the probabilistic notion of embedding suffices. Hence the modified plan is as 
follows. Find a low distortion probability distribution over embeddings of the 
input graph into (dominating) trees. Then solve min bisection optimally on each 
of these trees. Each of these solutions will induce a bisection on the original 
graph, and for the best of them the approximation ratio will be bounded by the 
average distortion of the probabilistic embedding into trees. 

The known probabilistic embeddings of graphs into trees are tailored to 
minimize average distortion, where the aspect that is being distorted is distance 
between vertices. However, for the intended application of min bisection, the
aspect that interests us is the capacity of cuts rather than distances. Hence 
Racke's approach for approximating min-bisection is as described above, but 
with the distinction that the distortion of embeddings is measured with respect 
to capacity rather than distance. 

To implement this approach, one needs to design probabilistic embeddings 
with low capacity distortion. Here is an informal statement of Racke's result in 
this respect. 

Theorem 1 For every graph on n vertices there is a probability distribution 
of embeddings into (dominating) trees with O(logn) average distortion of the 
capacity. Moreover, these embeddings can be found in polynomial time. 

This theorem is analogous to known theorems regarding the distortion of 
distances in probabilistic embeddings [9]. Racke's proof of Theorem [1] is by
a reduction between these two types of embeddings. The existence of such a 
reduction may not be unexpected (as an afterthought), because as we have seen 
in the bicriteria approximation algorithms, there are certain correspondences 
between capacity and distance. 

As a direct consequence of the theorem above and the modified plan for 
approximating min bisection, one obtains an $O(\log n)$ approximation for min
bisection. This will be discussed in more detail in Section [5].

3 An abstract setting 

In this section we present an abstract setting that as a special case will lead to 
the existential component of Theorem [1].

3.1 Definitions 

Let E be a set (of edges) and V a collection of nonempty multisets of E (that
we call paths). A mapping $M : E \to V$ maps every edge $i \in E$ to a path
$P \in V$. It will be convenient to represent a mapping by a matrix $M$, where
$M_{ij}$ counts the number of times the edge $j$ lies on the path $M(i)$.

Spanning tree example. E is a set of edges of a connected graph G. 
Consider a spanning tree T of G, and let V be the set of simple paths in T. 
Then there is a natural mapping M that maps every edge $(i,j) \in E$ to the set
of edges that form the unique simple path between vertices i and j in T. 

Tree embedding example. E is a set of edges of a connected graph G. 
Consider an arbitrary tree T defined over the same set of vertices as G (edges 
of T need not be edges of G). As in the spanning tree example above, there 
is a natural mapping from edges of G to paths in T. However, this is not a 
mapping in the sense defined above, because edges of T are not necessarily 
edges of G. To remedy this situation, we represent each edge (i, j) of T by a set
of edges that form a simple path between i and j in G. This representation is
not unique (there may be many simple paths between i and j in G), and hence
some convention is used to specify one such path uniquely (for example, one
may take a shortest path, breaking ties arbitrarily). Hence now each edge of
T corresponds to a set of edges in G, and each simple path in T corresponds
to a collection of several such sets. Now M can be the mapping that maps
each edge (i, j) in G to a multiset of edges of G that is obtained by joining
together (counting multiplicities) the sets of edges of G that form the paths
that correspond to the edges of T that lie along the simple path connecting i
and j in T.

Graph embedding example. This is a generalization of the tree embedding
example. E is a set of edges of a connected graph G on n vertices, and H is
an arbitrary different graph defined on the same set of vertices. Now an edge
$(i,j) \in E$ is mapped to some path (using a convention such as that of taking
a shortest path) that connects i and j in H, and this path in H is represented
as a multiset of edges in G (as in the tree embedding example). This defines a 
mapping M. Natural alternative versions of the mapping reduce the multiset 
to a set, either by removing multiplicities of edges in the multiset, or by more 
extensive processing (e.g., if the multiset corresponded to a nonsimple path in 
G that contains cycles, these cycles may possibly be removed). 

Hypergraph example. E is the set of hyperedges of a connected 3-uniform
hypergraph H. T is a spanning tree of the hypergraph H, in the sense that it
is defined on the same set of vertices as H, and every edge (i, j) of T is labeled
by some hyperedge of H that contains vertices i and j. Then a hyperedge (i, j, k)
can be mapped to the set of hyperedges that label the set of edges along the
paths that connect i, j and k in T.

We note that the tree embedding example is essentially the setting in Racke's 
work [16]. However the spanning tree example suffices in order to illustrate the
main ideas in Racke's work. 

Let $\mathcal{M}$ be a family of admissible mappings. A probabilistic mapping between
E and $\mathcal{M}$ is a probability distribution over mappings $M \in \mathcal{M}$. That is, with
every $M \in \mathcal{M}$ we associate some $\lambda_M \ge 0$, with $\sum_{M \in \mathcal{M}} \lambda_M = 1$.

We shall consider probabilistic mappings in two different contexts. 

Definition 2 (Distance mapping). Every edge $i$ has a positive length $\ell_i$
associated with it. We let $\mathrm{dist}_M(i)$ denote the length of the path $M(i)$, namely
$\mathrm{dist}_M(i) = \sum_j M_{ij} \ell_j$. The stretch of an edge $i$ is $\mathrm{dist}_M(i)/\ell_i$. The average stretch
of an edge in a probabilistic mapping is the weighted average (weighted according
to $\lambda_M$) of the stretches of the edge. The stretch of a probabilistic mapping is the
maximum over all edges of their average stretches.

The stretch of a particular edge may be smaller than 1. However, the stretch 
of the shortest edge will always be at least 1. 

Probabilistic distance mappings were considered in [5J [71 [T] for the spanning 
tree example, and in [51 [S] for the tree embedding example. 

Definition 3 (Capacity mapping). Every edge $i$ has a positive capacity $c_i$
associated with it. We let $\mathrm{load}_M(j)$ denote the sum (with multiplicities) of
capacities of edges whose path under $M$ contains $j$. Namely,
$\mathrm{load}_M(j) = \sum_i M_{ij} c_i$. The congestion of an edge $j$ is $\mathrm{load}_M(j)/c_j$.
The average congestion of an edge in a probabilistic mapping is the weighted
average (weighted according to $\lambda_M$) of the congestions of the edge. The
congestion of a probabilistic mapping is the maximum over all edges of their
average congestions.

The congestion of an edge may be smaller than 1. However, the sum of all 
capacities in M(E) is at least as large as the sum of all capacities in E, implying 
that the congestion of a probabilistic mapping is always at least 1. 

For concreteness, let us present the notions of distance, stretch, load and
congestion as applied to the spanning tree example. Consider a connected graph
G in which every edge e = (i, j) has a positive length $\ell_e$ and a positive capacity
$c_e$. Consider an arbitrary spanning tree T of G, and the mapping from G to T
described in the spanning tree example above. Then the distance of edge e is
the sum of lengths of edges along the unique simple path that connects vertices
i and j in T. The stretch of e is then the ratio between this distance and $\ell_e$.
The load on edge e is 0 if e is not part of the spanning tree. However, if e is
part of the spanning tree, the load is computed as follows. Removing e, the
tree T decomposes into two trees, one containing vertex i (that we call $T_i$) and
the other containing vertex j (that we call $T_j$). The load of e is the sum of
capacities of all edges (including e itself) that have one endpoint in $T_i$ and the
other in $T_j$. The congestion of e is the ratio between the load and $c_e$.
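
The spanning tree computation just described can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the input format (a dictionary from vertex pairs to length and capacity) are hypothetical, and the tree is assumed to be a spanning tree of the input graph.

```python
from collections import defaultdict

def tree_path(tree_edges, src, dst):
    """Edges on the unique simple path from src to dst in the tree."""
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = {src: None}
    stack = [src]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                stack.append(y)
    path, x = [], dst
    while parent[x] is not None:
        path.append(frozenset((x, parent[x])))
        x = parent[x]
    return path

def stretch_and_congestion(edges, tree_edges):
    """edges: {(i, j): (length, capacity)}; tree_edges: spanning tree of G."""
    length = {frozenset(e): lc[0] for e, lc in edges.items()}
    capacity = {frozenset(e): lc[1] for e, lc in edges.items()}
    dist, load = {}, defaultdict(float)
    for (i, j) in edges:
        e = frozenset((i, j))
        path = tree_path(tree_edges, i, j)
        dist[e] = sum(length[f] for f in path)
        for f in path:
            load[f] += capacity[e]  # e contributes its capacity to each tree edge it uses
    stretch = {e: dist[e] / length[e] for e in length}
    congestion = {e: load[e] / capacity[e] for e in capacity}  # non-tree edges get 0
    return stretch, congestion

# Unit triangle with the spanning tree {(0, 1), (1, 2)}.
triangle = {(0, 1): (1.0, 1.0), (1, 2): (1.0, 1.0), (0, 2): (1.0, 1.0)}
s, c = stretch_and_congestion(triangle, [(0, 1), (1, 2)])
print(s[frozenset((0, 2))], c[frozenset((0, 1))], c[frozenset((0, 2))])  # 2.0 2.0 0.0
```

In the triangle, the non-tree edge (0, 2) is routed over two tree edges, so its stretch is 2 and each tree edge carries load 2, while (0, 2) itself carries no load.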

3.2 Probabilistic mappings as zero-sum games 

We shall use the following standard consequence of the minimax theorem for 
zero sum games (as in [5]). 

Lemma 4 For every $p \ge 1$ and every family of admissible mappings $\mathcal{M}$, there is
a probabilistic mapping with stretch at most $p$ if and only if for every nonnegative
coefficients $\alpha_i$, there is a mapping $M \in \mathcal{M}$ such that
$\sum_i \alpha_i \frac{\mathrm{dist}_M(i)}{\ell_i} \le p \sum_i \alpha_i$.

Proof. Consider a zero sum game in which the player MAP chooses an admis- 
sible mapping M, and the player EDGE chooses an edge i. The value of the 
game for EDGE is the stretch of i in the mapping, and hence EDGE wishes
to maximize the stretch whereas MAP wishes to minimize it. A probabilistic
mapping is a randomized strategy for MAP. Choosing nonnegative coefficients
$\alpha_i$ (and scaling them so that $\sum_i \alpha_i = 1$) is a randomized strategy for EDGE.

The "only if" direction. If there is no randomized strategy for MAP forcing 
an expected value at most p, then the minimax theorem implies that there must 
be a randomized strategy for EDGE that enforces an expected value more than 
p, regardless of which mapping player MAP chooses to play. 

The "if" direction. If there is no randomized strategy for EDGE forcing an 
expected value larger than p, then the minimax theorem implies that there must 
be a randomized strategy for MAP that enforces an expected value of at most 
p, regardless of which edge player EDGE chooses to play. ■ 
Lemma 5 For every $p \ge 1$ and every family of admissible mappings $\mathcal{M}$, there
is a probabilistic mapping with congestion at most $p$ if and only if for every
nonnegative coefficients $\beta_i$, there is a mapping $M \in \mathcal{M}$ such that
$\sum_i \beta_i \frac{\mathrm{load}_M(i)}{c_i} \le p \sum_i \beta_i$.

The proof of Lemma [5] is similar to the proof of Lemma [4] and hence is
omitted. 

3.3 Main result 

Theorem 6 For every $p \ge 1$ and every family of admissible mappings $\mathcal{M}$, the
following two statements are equivalent:

1. For every collection of lengths $\ell_i$ there is a probabilistic mapping with
stretch at most $p$.

2. For every collection of capacities $c_i$ there is a probabilistic mapping with
congestion at most $p$.

Proof. We first prove that item 2 implies item 1. 

Assume that there is a probabilistic mapping from E using $\mathcal{M}$ with
congestion at most $p$. By Lemma [5], for every nonnegative coefficients $\beta_j$, there is
a mapping $M \in \mathcal{M}$ such that $\sum_j \beta_j \frac{\mathrm{load}_M(j)}{c_j} \le p \sum_j \beta_j$. Hence, using the
notation from Definition [3], for every nonnegative coefficients $\beta_j$ we have a mapping
satisfying:

$$\sum_{i,j \in E} \beta_j \frac{M_{ij} c_i}{c_j} \le p \sum_{j \in E} \beta_j \qquad (1)$$

We need to prove that there is a probabilistic mapping from E using $\mathcal{M}$ with
stretch at most $p$. By Lemma [4], it suffices to prove that for every nonnegative
coefficients $\alpha_i$, there is a mapping $M \in \mathcal{M}$ such that $\sum_i \alpha_i \frac{\mathrm{dist}_M(i)}{\ell_i} \le p \sum_i \alpha_i$.
Hence, using the notation from Definition [2], for every nonnegative coefficients
$\alpha_i$ we need to find a mapping satisfying:

$$\sum_{i,j \in E} \alpha_i \frac{M_{ij} \ell_j}{\ell_i} \le p \sum_{i \in E} \alpha_i \qquad (2)$$

Choosing $\beta_j = \alpha_j$ and $c_i = \alpha_i/\ell_i$ (and likewise, $c_j = \alpha_j/\ell_j$) and substituting
in inequality (1), we obtain inequality (2).

The proof that item 1 implies item 2 is similar, choosing $\alpha_i = \beta_i$ and
$\ell_i = \beta_i/c_i$, and then inequality (2) becomes inequality (1). ■
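
The substitution at the heart of this proof can be checked numerically. The sketch below is only a sanity check on a random instance, not part of the proof; the function names are hypothetical, and the coefficients are assumed strictly positive so that the derived capacities are nonzero.

```python
import random

def weighted_stretch(M, alpha, length):
    """sum_i alpha_i * dist_M(i) / length_i, with dist_M(i) = sum_j M[i][j] * length_j."""
    n = len(M)
    return sum(alpha[i] * sum(M[i][j] * length[j] for j in range(n)) / length[i]
               for i in range(n))

def weighted_congestion(M, beta, cap):
    """sum_j beta_j * load_M(j) / cap_j, with load_M(j) = sum_i M[i][j] * cap_i."""
    n = len(M)
    return sum(beta[j] * sum(M[i][j] * cap[i] for i in range(n)) / cap[j]
               for j in range(n))

random.seed(0)
n = 5
M = [[random.randint(0, 3) for _ in range(n)] for _ in range(n)]
alpha = [random.uniform(0.1, 1.0) for _ in range(n)]   # assumed strictly positive
length = [random.uniform(0.5, 2.0) for _ in range(n)]

# The substitution used in the proof: beta = alpha and c_i = alpha_i / length_i.
beta = alpha[:]
cap = [alpha[i] / length[i] for i in range(n)]

lhs = weighted_congestion(M, beta, cap)
rhs = weighted_stretch(M, alpha, length)
print(abs(lhs - rhs) < 1e-9)  # True
```

Because the two weighted sums agree term by term for the same matrix M, a mapping that satisfies one inequality automatically satisfies the other.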

Observe that the proof of Theorem [6] does not assume that entries of matrices
$M \in \mathcal{M}$ are nonnegative integers (except for the issue that p is stated as a
quantity of value at least 1). Neither does it assume that distances or capacities 
are nonnegative (though they cannot be 0, since the expressions for stretch and 
congestion involve divisions by distances or capacities). 
3.4 Simultaneous stretch and congestion bounds

Let $\mathcal{M}$ be a family of admissible mappings for which there is a probabilistic
mapping with stretch at most p. Hence by Theorem [6], there is also a (possibly
different) probabilistic mapping with congestion at most p. However, this does 
not imply that there is a probabilistic mapping which simultaneously achieves 
stretch at most p and congestion at most p. 

Consider the following example. The edges E are the set of edges of the 
following graph G. G has two special vertices denoted by s and t. There are 
$\sqrt{n}$ vertex disjoint paths between s and t, each with $\sqrt{n}$ edges. In addition,
there is the edge (s,t). Hence altogether E contains n + 1 edges. The family
$\mathcal{M}$ of admissible mappings is the canonical family corresponding to the set of
all spanning trees of G, as described in Section [3.1]: given an edge of E and a
spanning tree T of G, the edge is mapped to the unique path in T joining its 
endpoints. 

Let $\ell$ be an arbitrary length function on E. The following is a probabilistic
mapping into $\mathcal{M}$ of stretch at most 3. Let P be the shortest path in G between
s and t (breaking ties arbitrarily). It may be either the edge (s,t) or one of the
$\sqrt{n}$ paths. For the probabilistic mapping, we choose a random spanning tree as
follows. All edges of P are contained in the spanning tree. In addition, from
every path $P' \neq P$, exactly one edge is deleted, with probability proportional
to the length of the edge.

Let us analyze the expected stretch of the above probabilistic mapping. For
edges along the path P, the stretch is 1. Consider now an edge of length $\ell$ in
a path $P' \neq P$ of length $L$. With probability $(L - \ell)/L < 1$ the edge remains
in the random spanning tree, keeping its original length. With probability $\ell/L$
the edge is not in the spanning tree, and then it is mapped to a path of length
at most $2L - \ell < 2L$ (we used here the fact that the length of P is at most L).
Hence the expected stretch is smaller than $1 + \frac{\ell}{L} \cdot \frac{2L}{\ell} = 3$.
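
Keeping the exact terms instead of the rounded ones gives a slightly sharper form of the same bound. The following worked computation is a sketch that uses only the quantities named in the paragraph above (the edge length $\ell$, the path length $L$, and the fact that the length of P is at most L):

```latex
\mathbb{E}[\mathrm{stretch}]
  \le \frac{L-\ell}{L}\cdot 1 \;+\; \frac{\ell}{L}\cdot\frac{2L-\ell}{\ell}
  = \frac{(L-\ell) + (2L-\ell)}{L}
  = 3 - \frac{2\ell}{L}
  < 3 .
```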

Let c be an arbitrary capacity function on E. The following is a probabilistic
mapping into $\mathcal{M}$ of congestion at most 3. For every path $P_j$ between s and t
(including the edge (s, t) as one of the paths), let $e_j$ denote the edge of minimum
capacity on this path, and let $c_j$ be its capacity. Choose a random spanning
tree that contains all edges except the $e_j$ edges, and exactly one of the $e_j$ edges,
chosen with probability proportional to its capacity.

Let us analyze the expected congestion of the above probabilistic mapping.
Consider an arbitrary edge e on path $P_j$. Its capacity c is at least $c_j$. The
load that it suffers is at most its own capacity c, plus perhaps the capacity
$c_j \le c$ of edge $e_j$, plus with probability $c_j/(\sum_i c_i) \le c/(\sum_i c_i)$ a capacity of
$\sum_{i \neq j} c_i \le \sum_i c_i$, which in expectation contributes at most c. Hence altogether,
its expected load is at most 3c.

The two probabilistic mappings that we designed (one for stretch, one for 
congestion) are very different. In particular, the sizes of the supports are $n^{\Omega(\sqrt{n})}$
for the stretch case and $\sqrt{n} + 1$ for the congestion case. We now show that
if in the graph G all lengths and all capacities are equal to 1 (the unweighted
case), every probabilistic mapping must have either stretch or congestion at least
$\sqrt{n}/2$. To achieve stretch less than $\sqrt{n}/2$, the edge (s,t) must belong to the
random spanning tree with probability at least 1/2. However, whenever (s,t)
is in the spanning tree, then from every other path connecting s and t one edge
needs to be removed, contributing a total load of $\sqrt{n}$ to the edge (s, t). It is also
interesting to observe that a random spanning tree (chosen uniformly at random
from all spanning trees) contains the edge (s, t) with probability exactly 1/2 (this
is because the effective resistance between s and t is 1/2, details omitted). Hence
the uniform distribution over spanning trees is simultaneously bad (distortion
at least $\sqrt{n}/2$) both for stretch and for congestion.

We have seen that the probabilistic mappings that achieve low congestion 
may be very different from those that achieve low stretch. However, as we shall
see in Section [6], for the special case considered here (that of spanning trees in
planar graphs), there are additional connections between stretch and congestion, 
based on planar duality. 

4 Algorithmic aspects 

In Section [3] our discussion was concerned only with the existence of probabilistic
mappings. Here we shall discuss how such mappings can be found efficiently. As
we have seen in Section [3.2], and using the notation of Lemma [4], the problem
of finding a probabilistic mapping with smallest distortion can be cast as a 
problem of finding an optimal mixed strategy for the player MAP, in a zero 
sum game between the players MAP and EDGE. We shall briefly review the 
known results concerning the computation of optimal mixed strategies in zero 
sum games, and how they can be applied in our setting. Hence in a sense, this 
section is independent of Section [3].

4.1 An LP formulation of zero sum games 

It is well known that the value of a zero sum game is a solution to a linear 
program (LP), and that linear programming duality in this case implies the 
minimax theorem. 

Consider a game matrix A with r rows (the pure strategies for MAP) and c 
columns (the pure strategies for EDGE), in which entry $A_{ij}$ contains the payoff
for EDGE if MAP plays row i and EDGE plays column j. MAP wishes to select
a mixed strategy that minimizes the expected payoff, whereas EDGE wishes to 
select a mixed strategy that maximizes this payoff. 

With each row $i$ we can associate a variable $x_i$ that denotes the probability
with which row $i$ is played in MAP's mixed strategy. Then the linear program
is to minimize $p$ subject to:

• $\sum_i A_{ij} x_i \le p$ for all columns $j$,

• $\sum_i x_i = 1$,

• $x_i \ge 0$ for all $i$.
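
For reference, the dual of this linear program describes EDGE's side of the game. The following is a sketch in which the variables $y_j$ (the probability with which EDGE plays column $j$) and the objective $q$ are names introduced here for illustration; they do not appear in the text above.

```latex
\begin{aligned}
\text{maximize } \;   & q \\
\text{subject to } \; & \sum_j A_{ij} y_j \ge q && \text{for all rows } i, \\
                      & \sum_j y_j = 1, \\
                      & y_j \ge 0 && \text{for all } j.
\end{aligned}
```

By linear programming duality the optimal values of $p$ and $q$ coincide, which is the statement of the minimax theorem for this game.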
An immediate consequence of this LP formulation is that an optimal solution
to the LP can be found in time polynomial in the size of the game matrix
A (e.g., using the Ellipsoid algorithm). However, in our context of probabilistic 
mappings, it will often be the case that the size of A is not polynomial in the 
parameters of interest. For example, when mapping a graph G into a distribu- 
tion over spanning trees, the parameter of interest is typically n, the number of 
vertices in the graph. The number of edges (and hence number of columns in 
A) is at most $n^2$ and hence polynomially bounded in n. However, the number
of mappings (number of spanning trees in G) might be of the order of $n^{\Omega(n)}$,
which is not polynomial in n. Hence if one is interested in algorithms with run- 
ning time polynomial in n, one cannot even write down the matrix A explicitly 
(though the graph G serves as an implicit representation of A). 

Though the case that will interest us most is when there are superpolyno- 
mially many row strategies and polynomially many column strategies, let us 
discuss first the case when there are polynomially many row strategies and su- 
perpolynomially many column strategies. 

We note that in the discussions that follow we shall assume that all payoffs 
are rational numbers with numerators and denominators represented by a num- 
ber of bits that is polynomial in a parameter of interest (such as the smallest of 
the two dimensions of A). 

4.2 Superpolynomially many column strategies 

As we cannot afford to write the matrix A explicitly, we need to assume some 
other mechanism for accessing the strategies of the column player. Typically, 
this is viewed abstractly as oracle access. A natural oracle model is the following: 

Best response oracle. Given a mixed strategy for the row player, the 
oracle provides a pure strategy for the column player of highest expected payoff 
(together with the corresponding column of A).

If a best response oracle is available, one may still run the ellipsoid algorithm 
(with the best response oracle serving as a separation oracle) and obtain an 
optimal solution to the LP (and hence an optimal mixed strategy for the game). 
See [13] for more details on this approach.

4.3 Superpolynomially many row strategies 

Here we address the main case of interest, when there are superpolynomially 
many row strategies, but only polynomially many column strategies. One issue 
that has to be dealt with is whether an optimal mixed strategy for the row 
player can be represented at all in polynomial space, given that potentially it 
requires specifying probabilities for superpolynomially many strategies. Luckily, 
the answer is positive. Mixed strategies are solutions to linear programs, and 
linear programs have basic feasible solutions in which the number of nonzero 
variables does not exceed the number of constraints (omitting the nonnegativity 
constraints $x_i \ge 0$). Hence there is an optimal mixed strategy for the row player
whose support (the number of pure strategies that have positive probability of
being played) is not larger than the number of columns. 

To access the pure strategies of the row player, let us assume here that we 
have a best response oracle for the row player. Now a standard approach is 
to consider the dual of the LP, which corresponds to finding an optimal mixed 
strategy for the column player. By analogy to Section [4.2], one can use the
ellipsoid algorithm to find an optimal mixed strategy for the column player. 

An optimal solution to the dual LP has the same value as an optimal solution 
to the primal LP, but is not by itself a solution to the primal LP (and hence, we 
still did not find an optimal mixed strategy for the row player). However, there 
are certain ways of leveraging the ability to solve the dual LP and using it so as 
to also solve the primal LP. One such approach employs an exploration phase 
that finds polynomially many linearly independent constraints of the dual that 
are tight at the optimal dual solution, and then find an optimal primal solution 
that is supported only on the primal variables that correspond to these dual 
constraints. Details are omitted here (but presumably appear in [13]).

4.4 Weaker oracle models and faster algorithms 

In the games that interest us, typically there are polynomially many columns 
(the column player is EDGE who can play an edge in the graph) and
superpolynomially many rows (the row player MAP has exponentially many mappings to
choose from). In this respect, we are in the setting of Section [4.3]. However, we
might not have a best response oracle representation of MAP. Instead we shall 
often have a weaker kind of oracle. 

$\delta$-response oracle. Given a mixed strategy for the column player, the oracle
provides a pure strategy for the row player (together with the corresponding row
of A) that limits the expected payoff (to column player) to at most $\delta$.

Let p be the true minimax value of the game. Then the value of $\delta$ for a
$\delta$-response oracle must be at least p, but in general might be much larger than p.
If this is the only form of access to the pure strategies of the row player, finding
an optimal mixed strategy for the row player becomes hopeless. Hence the goal
is no longer to find the optimal mixed strategy for the row player, but rather
to find a mixed strategy that limits the expected payoff of the column player
to at most $\delta$ (plus low order terms). We sketch here an approach of Freund
and Schapire [12] that can be used. It is based on the use of regret minimizing
algorithms. 

Consider an iterative process in which in each round, the column player
selects a mixed strategy, the row player selects in response a $\delta$-response (pure)
strategy, and the column player collects the expected payoff of his mixed
strategy against that pure strategy. If the column player is using a regret minimizing
online algorithm in order to select his mixed strategies (such as using a
multiplicative weight update rule), then after polynomially many rounds (say, t), his
payoff (which can be at most $\delta t$) is guaranteed to approach (up to low order
terms) the total payoff that the best fixed column pure strategy can achieve 
against the actual sequence of pure strategies played by the row player. This 
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means that if the row player plays the mixed strategy of choosing one of the t 
rounds at random and playing the row strategy that was played in this round, 
no pure column strategy has expected payoff significantly larger than S. For 
more details on this subject, the reader is referred to | 12| . or to surveys such 
as [5] or [3J. 
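
The iterative process above can be sketched in a few lines. The sketch below is only an illustration under stated assumptions, not the procedure of [12] verbatim: payoffs are assumed to be scaled to [0, 1], the column player uses the exponential-weights (Hedge) form of the multiplicative weight update rule with a fixed learning rate, and the names `approx_row_strategy` and `best_response` are hypothetical. In the toy demo the oracle is an exact best response on a 2 x 2 matrix, which is a δ-response oracle with δ equal to the value of the game (1/2).

```python
import math

def approx_row_strategy(num_cols, oracle, rounds):
    """Run Hedge for the column player against a delta-response oracle.

    oracle(p) must return (row_id, row), where row[j] in [0, 1] is the payoff
    to the column player and sum_j p[j] * row[j] <= delta.
    Returns the rows played (the row player's mixed strategy is uniform over
    this list, with repetitions) and the cumulative payoff of each column.
    """
    eta = math.sqrt(8.0 * math.log(num_cols) / rounds)  # fixed learning rate
    gains = [0.0] * num_cols      # cumulative payoff of each pure column strategy
    played = []
    for _ in range(rounds):
        top = max(gains)          # subtract the maximum for numerical stability
        weights = [math.exp(eta * (g - top)) for g in gains]
        total = sum(weights)
        p = [w / total for w in weights]   # mixed strategy of the column player
        row_id, row = oracle(p)            # delta-response of the row player
        played.append(row_id)
        for j in range(num_cols):
            gains[j] += row[j]
    return played, gains

# Toy game (an assumption for illustration): payoff matrix to the column player.
A = [[1.0, 0.0], [0.0, 1.0]]

def best_response(p):
    # Exact best response of the row player: the row minimizing expected payoff.
    i = min(range(len(A)), key=lambda r: sum(pj * a for pj, a in zip(p, A[r])))
    return i, A[i]

rounds = 2000
played, gains = approx_row_strategy(2, best_response, rounds)
worst = max(gains) / rounds   # best pure column payoff against the uniform mix
```

Here `worst` is the largest expected payoff that any pure column strategy obtains against the row player's uniform mixture over the played rows. Under the standard regret bound for exponential weights it exceeds δ by at most sqrt(ln(num_cols) / (2 * rounds)).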

4.5 Implementation for probabilistic mappings 

Consider a zero sum game as in the setting of Lemma [3]. EDGE has polynomially 
many strategies, whereas MAP potentially has exponentially many strategies. 
A best response oracle for MAP needs to be able to find a best response for 
MAP against any given mixed strategy of EDGE. In many contexts, finding 
the best response is NP-hard. However, for the intended applications, often 
a δ-response suffices, provided that one can guarantee that δ is not too large. 
Indeed, given coefficients a on the edges of the input graph, one can find a 
spanning tree with average stretch Õ(log n) [1] (the Õ notation hides some lower 
order multiplicative terms), and a tree embedding of average stretch O(log n) [9]. 
This in combination with an algorithmic framework similar to that outlined in 
Section [4.4] gives probabilistic embeddings with stretch Õ(log n) and O(log n) 
respectively. 

The above results in combination with Theorem [6] imply that there is also a 
probabilistic mapping into spanning trees with congestion Õ(log n) and a 
probabilistic mapping into (arbitrary) trees with congestion O(log n). To actually 
find such a mapping algorithmically, one needs to find a mixed strategy for the 
player MAP in the corresponding zero sum game. The algorithmic framework 
of Section [4.4] shows that this can be done if we can implement a δ-response 
oracle for MAP. We have already seen that such an oracle can be implemented 
for distance mappings, but we now need to do so for capacity mappings. Luckily, 
the proof of Theorem [6] can be used for this purpose. It shows how to transform 
any δ-response query to a capacity mapping oracle into a δ-response query to 
a distance mapping oracle. This establishes the algorithmic aspect of 
Theorem [1]. We remark that the resulting probabilistic mappings have support size 
polynomial in n. 

5 Applications 

Racke describes several applications of his results, with oblivious routing being 
a prominent example. Here we concentrate only on one of the applications, that 
of min-bisection that served as our motivating example. 

Let G be a connected graph on an even number n of vertices, in which edges 
have nonnegative capacities. One wishes to find a bisection of minimum width 
(total capacity of edges with endpoints in different sides of the bipartization). 
We present here a polynomial time algorithm with approximation ratio Õ(log n). 

Consider an arbitrary spanning tree T of G. Every edge e = (i, j) of T 
partitions the vertices of G into two sets that we call T_i and T_j. Define the load 
load_T(e) of edge e to be the sum of capacities of edges of G with one endpoint 
in T_i and the other endpoint in T_j. (This is consistent with Section [3.1].) 
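
As a small illustration of this definition, the following sketch computes the load of each tree edge by removing it from T, collecting the vertices on one side, and summing the capacities of the edges of G that cross. The function name `tree_edge_load`, the dictionary representation of capacities, and the 4-cycle example are assumptions made for illustration only.

```python
from collections import defaultdict

def tree_edge_load(capacity, tree_edges, e):
    """Return load_T(e): the total capacity of edges of G that cross the cut
    induced on the vertices by removing the tree edge e = (i, j) from T."""
    adj = defaultdict(list)
    for u, v in tree_edges:
        if (u, v) != e:               # drop e, keep the rest of the tree
            adj[u].append(v)
            adj[v].append(u)
    side_i, stack = {e[0]}, [e[0]]    # vertices reachable from i without e
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in side_i:
                side_i.add(y)
                stack.append(y)
    return sum(c for (u, v), c in capacity.items()
               if (u in side_i) != (v in side_i))

# G is a 4-cycle with capacities 1, 2, 3, 4; T is the path 0-1-2-3.
capacity = {(0, 1): 1, (1, 2): 2, (2, 3): 3, (3, 0): 4}
tree = [(0, 1), (1, 2), (2, 3)]
loads = {e: tree_edge_load(capacity, tree, e) for e in tree}
```

In this example every tree edge carries its own capacity plus the capacity 4 of the non-tree edge (3, 0), whose tree path uses all three tree edges.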

Consider now an arbitrary bipartization B of G. Let E_T(B) be the set of 
edges of T that have endpoints in different sides of the bipartization. Then 
the width of the bipartization is at most Σ_{e ∈ E_T(B)} load_T(e). (The load terms 
count every edge of G cut by the bipartization at least once and perhaps multiple 
times, and possibly also count edges of G not cut by the bipartization.) This is the 
domination property that we were referring to in Section 

By Theorem [1], whose proof is summarized in Section [5], one can find in 
polynomial time a distribution over spanning trees of G such that for every 
edge of G, its expected congestion (over choice of random spanning tree) is at 
most δ = Õ(log n). 

Consider an optimal bisection in G, and let b denote its width. For each edge 
cut by the bisection, its expected congestion over the probabilistic mapping into 
spanning trees is at most δ. Summing over all edges cut by the bisection and 
taking a weighted average over all spanning trees in the probabilistic mapping, 
we obtain that in at least one such tree T, the width of this bisection (with 
respect to the load in that tree) is at most δb. 

The above discussion gives the following algorithm for finding a bisection of 
small width in a graph G whose minimum bisection has width b. 

1. Find a probabilistic mapping into spanning trees with congestion at most 
δ. (By the discussion above this step takes polynomial time, and δ can 
be taken to be Õ(log n). Furthermore, the set of spanning trees in the 
support of the probabilistic mapping has size polynomial in n.) 

2. In each spanning tree, find an optimal bisection (with respect to the load) 
using dynamic programming. This takes polynomial time. Moreover, by 
the discussion above, in at least one tree the bisection found will have 
width at most δb. 

3. Of all the bisections found (one per spanning tree), take the one that in G 
has smallest width. By the domination property, its width is at most δb. 
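
Steps 2 and 3 can be sketched as follows. This is a minimal sketch under stated assumptions rather than a tuned implementation: the list of spanning trees is taken as given (step 1 is not implemented), the dynamic program stores whole vertex sets in its table for simplicity, and all function names are hypothetical. The two trees in the example are arbitrary spanning trees of a 4-cycle, not the support of a low-congestion distribution.

```python
from collections import defaultdict

def cut_width(capacity, side):
    """Total capacity of edges of G with exactly one endpoint in `side`."""
    return sum(c for (u, v), c in capacity.items() if (u in side) != (v in side))

def tree_loads(capacity, tree_edges):
    """load_T(e) for every tree edge e: the width in G of the cut induced by removing e."""
    loads = {}
    for e in tree_edges:
        adj = defaultdict(list)
        for u, v in tree_edges:
            if (u, v) != e:
                adj[u].append(v)
                adj[v].append(u)
        side, stack = {e[0]}, [e[0]]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in side:
                    side.add(y)
                    stack.append(y)
        loads[e] = cut_width(capacity, side)
    return loads

def tree_bisection(tree_edges, loads):
    """Step 2: a bisection of the tree minimizing the total load of cut tree edges."""
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append((v, loads[(u, v)]))
        adj[v].append((u, loads[(u, v)]))

    def solve(v, parent):
        # table[s][k] = (cost, side-0 vertices) for the subtree of v,
        # with v on side s and exactly k subtree vertices on side 0.
        table = {0: {1: (0, frozenset([v]))}, 1: {0: (0, frozenset())}}
        for child, w in adj[v]:
            if child == parent:
                continue
            sub = solve(child, v)
            merged = {0: {}, 1: {}}
            for s in (0, 1):
                for k, (cost, part) in table[s].items():
                    for cs in (0, 1):
                        for ck, (ccost, cpart) in sub[cs].items():
                            total = cost + ccost + (w if cs != s else 0)
                            if k + ck not in merged[s] or total < merged[s][k + ck][0]:
                                merged[s][k + ck] = (total, part | cpart)
            table = merged
        return table

    table = solve(tree_edges[0][0], None)
    half = (len(tree_edges) + 1) // 2       # n / 2 for a tree on n vertices
    return min((table[s][half] for s in (0, 1) if half in table[s]),
               key=lambda t: t[0])

def best_bisection(capacity, trees):
    """Step 3: among the per-tree bisections, keep the one of smallest width in G."""
    best = None
    for tree_edges in trees:
        _, side = tree_bisection(tree_edges, tree_loads(capacity, tree_edges))
        width = cut_width(capacity, side)
        if best is None or width < best[0]:
            best = (width, set(side))
    return best

capacity = {(0, 1): 1, (1, 2): 2, (2, 3): 3, (3, 0): 4}
trees = [[(0, 1), (1, 2), (2, 3)], [(1, 2), (2, 3), (3, 0)]]
width, side = best_bisection(capacity, trees)
```

On this instance the first tree yields a bisection of width 6 and the second yields one of width 4, which is the minimum bisection of the 4-cycle, so the second is returned.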

The approximation ratio that we presented above for min-bisection is δ = 
Õ(log n), rather than O(log n) as was done by Racke. To get the O(log n) 
approximation, instead of probabilistic mappings into spanning trees one simply 
uses probabilistic mappings into (arbitrary but dominating) trees. Then one can 
plug in the bounds of [9] rather than the somewhat weaker bounds of [1] and 
obtain the desired approximation ratio. Details omitted. 

6 Spanning trees in planar graphs 

In Section [3.4] we saw that for distributions over spanning trees of planar graphs, 
the distributions achieving low stretch are very different from those achieving 
low congestion. In this section we present an interesting connection between 
low stretch and low congestion for spanning trees in planar graphs. A similar 
connection was observed independently (and apparently, before our work) by 
Yuval Emek [8]. 

The family of graphs that we shall consider is that of 2-connected planar 
multigraphs. Specifically, the graphs need to be planar, connected, with no 
cut edge (an edge whose removal disconnects the graph) , and parallel edges are 
allowed. In the context of spanning trees, restricting graphs to be 2-connected is 
not a significant restriction, because disconnected graphs do not have spanning 
trees, and every cut edge belongs to every spanning tree and hence does not 
contribute to the complexity of the problem. The reason why we allow parallel 
edges is so that the notion of a dual of a planar graph will always be defined. 
From every set of parallel edges, a spanning tree may contain at most one edge. 

Every planar graph can be embedded in the plane with no intersecting edges. 
In fact, several algorithms are known to produce such embeddings in linear time. 
This embedding might not be unique, in which case we fix one planar embedding 
arbitrarily. Given a planar embedding, the dual graph is obtained by considering 
every face of the embedding (including the outer face) to be a vertex of the dual 
graph, and every edge of the embedding corresponds to an edge of the dual 
graph that connects the two vertices that correspond to the two faces that the 
edge separates. (The fact that the graph has no cut edges ensures that the dual 
has no self loops. Two faces that share more than one edge give in the dual 
parallel edges.) The dual graph is planar and the planar embedding that we 
associate with it is the one naturally obtained by the above construction. Under 
this planar embedding, the dual of the dual is the primal graph. Cycles in the 
primal graph correspond to cuts in the dual graph and vice versa. It is well 
known and easy to see that given a spanning tree in the primal graph, the edges 
not in the spanning tree form a spanning tree in the dual graph. (This also 
gives Euler's formula that |V| - 1 + |F| - 1 = |E|.) 

Consider now a length function on the edges of the primal graph. Given 
a spanning tree of the primal graph, for every spanning tree edge its stretch 
is 1, and for every other edge its stretch is determined by the length of the 
fundamental cycle that it closes with the spanning tree edges. In the dual 
graph, let the capacity of an edge be equal to the length of the corresponding 
edge in the primal graph. Consider the dual spanning tree. The congestion 
of edges not on the dual spanning tree is 0 (one less than their stretch in the 
primal). The load of an edge on the dual spanning tree is precisely the sum 
of the lengths of the edges on the corresponding fundamental cycle in the primal 
graph (equivalently, of the capacities of their dual edges), and hence its 
congestion is exactly one more than the stretch in the primal. 
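
The correspondence can be checked by hand on the simplest 2-connected planar graph, a cycle. Its dual has two vertices (the inner and outer faces) joined by one parallel edge per cycle edge. A primal spanning tree omits one cycle edge r, and the dual spanning tree consists of the single dual edge of r. The sketch below is a worked example only: the function name is hypothetical, and positive integer lengths are assumed so that exact fractions can be used.

```python
from fractions import Fraction

def cycle_stretch_and_dual_congestion(lengths, removed):
    """Primal: a cycle with edge lengths `lengths`; the spanning tree omits
    edge `removed`. Dual: two faces joined by parallel edges whose capacities
    equal the primal lengths; the dual tree is the dual of the omitted edge.
    Returns (stretch in the primal, congestion in the dual), one entry per edge."""
    total = sum(lengths)
    stretch, congestion = [], []
    for k, length in enumerate(lengths):
        if k == removed:
            # Non-tree primal edge: its tree path consists of all other edges.
            stretch.append(Fraction(total - length, length))
            # Dual tree edge: removing it separates the two faces, so its
            # load is the capacity of all parallel edges, i.e. the cycle length.
            congestion.append(Fraction(total, length))
        else:
            stretch.append(Fraction(1))     # primal tree edge
            congestion.append(Fraction(0))  # dual non-tree edge carries no load
    return stretch, congestion

stretch, congestion = cycle_stretch_and_dual_congestion([2, 3, 5], removed=0)
```

With lengths 2, 3, 5 and the edge of length 2 omitted, that edge has stretch 8/2 = 4 in the primal and its dual has congestion 10/2 = 5, while the other two edges have stretch 1 and congestion 0.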

The above deterministic correspondence has the following probabilistic corol- 
lary. 

Corollary 7 Consider an arbitrary 2-connected planar graph G and its planar 
dual G*. Assume that edges in G have nonnegative lengths whereas edges in 
G* have nonnegative capacities, and moreover, the capacity of an edge in G* is 
equal to the length of the corresponding edge in G. Then for every probabilistic 
mapping of G into spanning trees with stretch p, the same distribution over the 
dual trees forms a probabilistic mapping of G* into spanning trees with congestion 
at most p+1. Likewise, for every probabilistic mapping of G* into spanning trees 
with congestion p, the same distribution over the dual trees forms a probabilistic 
mapping of G into spanning trees with stretch at most p+1. 